[Metax][Optimization] Optimize PaddleOCR-VL vision path on Metax GPU#7619
Dryoung95 wants to merge 5 commits into PaddlePaddle:develop
Thanks for your contribution!
Force-pushed from c6b3374 to 162c6f5
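So far there is no code-level evidence of a compile failure, test failure, or smoke test failure; this looks more like a runner / Jenkins remoting layer problem. Could you please rerun this check?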
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7619 +/- ##
==========================================
Coverage ? 71.87%
==========================================
Files ? 396
Lines ? 55493
Branches ? 8689
==========================================
Hits ? 39884
Misses ? 12854
Partials ? 2755
@luotao1 Could you please trigger the CI?
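CI report generated from the following code (updated every 30 minutes):
1 Task overview
⏳ CI is still running. 4 Required tasks have not finished yet and no task has failed so far; please wait for completion to see the final result.
2 Task status summary
2.1 Required tasks: 4/8 passed
2.2 Optional tasks: 17/20 passed
3 Failure details (required only)
No required tasks failed.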
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-04-29 13:28:10
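📋 Review Summary
PR overview: Optimizes the PaddleOCR-VL vision path for Metax GPU by reducing host/device synchronization and repeated small-tensor operations, improving concurrent inference performance.
Scope of changes: model_executor/models/paddleocr_vl/ (projector, siglip, siglip_ops), worker/metax_model_runner.py, unit tests
Impact tags: [Models] [Metax] [Optimization]
📝 PR Convention Check
The title contains [Metax] and [Optimization], both official tags, and its format is compliant. The description includes all required sections (Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist) with substantial content, and the Checklist state matches the diff. ✅ The PR complies with the conventions.
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | siglip.py:131 | assert is used for the runtime batch=1 check; it is disabled under Python -O |
| 🟡 Suggestion | metax_model_runner.py:507 | assert is used for the runtime grid_thw consistency check; it is disabled under Python -O |
| ❓ Question | siglip_ops.py:40 | apply_rotary_pos_emb_vision now requires float32 input; please confirm neox_rope_embedding has been updated to match |
Overall assessment
This PR systematically reduces vision-path overhead through batched projection, fused metadata construction, LFU position-embedding cache reuse, and mm_hash deduplication. The approach is clear and the unit tests cover the changes. The two asserts used as runtime defensive checks should be replaced with raise ValueError so that they do not silently stop working under Python's optimized mode. For the float32 constraint on apply_rotary_pos_emb_vision, the author should clarify whether the neox_rope_embedding side has been updated to match.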
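The point about optimized mode can be checked without any Paddle code. The sketch below is illustrative only: the two check functions are hypothetical stand-ins for the PR's validation, and the built-in `compile()` is called with `optimize=1` to mimic what `python -O` does to assert statements.

```python
# Hypothetical stand-ins for the PR's runtime checks, kept as source text so
# the same code can be compiled at two optimization levels.
SRC = '''
def check_with_assert(batch):
    assert batch == 1, f"only batch=1 is supported, got {batch}"
    return "accepted"

def check_with_raise(batch):
    if batch != 1:
        raise ValueError(f"only batch=1 is supported, got {batch}")
    return "accepted"
'''


def load(optimize):
    """Compile SRC at the given optimization level and return its namespace."""
    namespace = {}
    exec(compile(SRC, "<assert-sketch>", "exec", optimize=optimize), namespace)
    return namespace


def outcome(fn, batch):
    """Return "accepted" or the name of the exception the check raised."""
    try:
        return fn(batch)
    except (AssertionError, ValueError) as exc:
        return type(exc).__name__


normal = load(0)     # optimization level of plain `python`
optimized = load(1)  # optimization level of `python -O`: asserts are removed

results = {
    "assert, normal": outcome(normal["check_with_assert"], 2),
    "assert, -O": outcome(optimized["check_with_assert"], 2),
    "raise, normal": outcome(normal["check_with_raise"], 2),
    "raise, -O": outcome(optimized["check_with_raise"], 2),
}
for name, result in results.items():
    print(f"{name}: {result}")
```

Only the assert-based check changes behavior between the two levels: with asserts removed, an invalid batch of 2 is accepted, while the explicit `raise ValueError` rejects it in both cases.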
    ):
        B, seq_length, D = hidden_states.shape
        if hidden_states.dim() == 3:
            assert hidden_states.shape[0] == 1, f"SiglipAttention only supports batch=1, got {hidden_states.shape}"
🟡 Suggestion: assert is used for a runtime shape check
Under Python -O, assert statements are skipped entirely, so the check silently stops working.
Consider raising an explicit exception instead:
    if hidden_states.shape[0] != 1:
        raise ValueError(
            f"SiglipAttention only supports batch=1, got {hidden_states.shape}"
        )

    def apply_rotary_pos_emb_vision(x, cos, sin):
        orig_dtype = x.dtype
        x = x.astype("float32")
        assert x.dtype == paddle.float32, f"expected float32, got {x.dtype}"
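❓ Question: apply_rotary_pos_emb_vision interface change now requires callers to guarantee float32 input
native_neox_rope_embedding already performs the cast correctly before the call. However, SiglipAttention.forward actually calls neox_rope_embedding with the signature (qkv, cos, sin, num_heads, head_dim), which looks like a custom op and does not appear in this PR's diff. If that function calls apply_rotary_pos_emb_vision internally without guaranteeing float32 input, the assert will fail at runtime.
Please confirm that neox_rope_embedding has been updated to guarantee float32 input, or explain that it does not go through this function.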
    else:
        grid_thw_tensor = paddle.to_tensor(grid_thw_key, dtype=paddle.int64)
    multi_vision_inputs["images_lst"].append(
        paddle.to_tensor(
🟡 Suggestion: assert is used for a runtime data consistency check
It is disabled under Python -O. Consider changing it to:
    if pending_mm_grid_thw[mm_hash] != grid_thw_key:
        raise ValueError(
            f"mm_hash {mm_hash} grid_thw mismatch: "
            f"{pending_mm_grid_thw[mm_hash]} != {grid_thw_key}"
        )
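To show where such a check could sit, here is a minimal sketch of mm_hash deduplication with the suggested explicit exception. It is not the PR's implementation: the function name `dedup_by_mm_hash` and the `(mm_hash, grid_thw, payload)` tuple layout are assumptions made for illustration, and plain Python values stand in for tensors.

```python
def dedup_by_mm_hash(items):
    """Keep the first item per mm_hash; reject hashes that disagree on grid_thw.

    `items` is an iterable of (mm_hash, grid_thw, payload) tuples. This layout
    is hypothetical and only stands in for the runner's real inputs.
    """
    pending_mm_grid_thw = {}  # mm_hash -> grid_thw tuple seen first
    unique_items = []         # first occurrence of each mm_hash, in input order
    for mm_hash, grid_thw, payload in items:
        grid_thw_key = tuple(grid_thw)
        if mm_hash in pending_mm_grid_thw:
            # Same hash seen before: it must describe the same grid.
            if pending_mm_grid_thw[mm_hash] != grid_thw_key:
                raise ValueError(
                    f"mm_hash {mm_hash} grid_thw mismatch: "
                    f"{pending_mm_grid_thw[mm_hash]} != {grid_thw_key}"
                )
            continue  # duplicate input, nothing new to encode
        pending_mm_grid_thw[mm_hash] = grid_thw_key
        unique_items.append((mm_hash, grid_thw_key, payload))
    return unique_items


items = [
    ("hash_a", [1, 2, 2], "image 0"),
    ("hash_b", [1, 4, 4], "image 1"),
    ("hash_a", [1, 2, 2], "image 0 again"),
]
unique = dedup_by_mm_hash(items)
print([mm_hash for mm_hash, _, _ in unique])
```

Because the mismatch is reported with `raise ValueError` rather than `assert`, a hash collision or corrupted metadata is still caught when the interpreter runs with -O.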
Motivation
This PR optimizes the PaddleOCR-VL vision path on Metax GPU.
During profiling, extra overhead was observed around extract_vision_features_paddleocr(), especially in vision metadata preparation, position embedding preparation, and projector-side data organization. This PR reduces unnecessary host/device synchronization and repeated small tensor operations while keeping the existing vision computation semantics unchanged.
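As a rough illustration of how repeated position embedding preparation can be avoided, the sketch below shows a small least-frequently-used (LFU) cache keyed by grid size. The PR's actual cache is not shown on this page, so the class name `LFUCache`, the `(height, width)` key, and the `compute` callback are all hypothetical, and a plain number stands in for the embedding tensor.

```python
class LFUCache:
    """Tiny LFU cache: evicts the entry with the fewest uses when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError(f"capacity must be positive, got {capacity}")
        self.capacity = capacity
        self._values = {}  # key -> cached value
        self._uses = {}    # key -> number of times the key was requested

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute(key) on a miss."""
        if key in self._values:
            self._uses[key] += 1
            return self._values[key]
        value = compute(key)
        if len(self._values) >= self.capacity:
            # Evict the least frequently used key; ties go to the oldest entry
            # because dicts iterate in insertion order.
            victim = min(self._uses, key=self._uses.get)
            del self._values[victim]
            del self._uses[victim]
        self._values[key] = value
        self._uses[key] = 1
        return value


computed = []  # records every cache miss, in order


def fake_position_embedding(grid_hw):
    """Hypothetical stand-in for an expensive embedding computation."""
    computed.append(grid_hw)
    height, width = grid_hw
    return height * width


cache = LFUCache(capacity=2)
cache.get_or_compute((2, 3), fake_position_embedding)
cache.get_or_compute((2, 3), fake_position_embedding)  # hit, no recompute
cache.get_or_compute((4, 4), fake_position_embedding)
cache.get_or_compute((5, 5), fake_position_embedding)  # evicts (4, 4)
print(computed)
```

A cache like this leaves the computed values unchanged and only skips recomputation for grid sizes that recur across requests.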
Modifications
This PR updates the following files:
Main changes:
Usage or Command
Unit test added for this PR:
Local validation commands:
Performance validation on Metax GPU:
Accuracy Tests
This PR keeps the PaddleOCR-VL vision math semantics unchanged and only reduces unnecessary data organization and metadata movement overhead.
Validation performed:
Local test result:
Checklist